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A Note on the Importance of Weak Convergence 
Rates for SPDE Approximations in Multilevel 
Monte Carlo Schemes 


Annika Lang 


Abstract It is a well-known rule of thumb that approximations of stochastic partial 
differential equations have essentially twice the order of weak convergence compared 
to the corresponding order of strong convergence. This is already known for many 
approximations of stochastic (ordinary) differential equations while it is recent 
research for stochastic partial differential equations. In this note it is shown how 
the availability of weak convergence results influences the number of samples in 
multilevel Monte Carlo schemes and therefore reduces the computational complexity 
of these schemes for a given accuracy of the approximations. 


1 Introduction 

Since the publication of Giles’ articles about multilevel Monte Carlo methods [8, 9], 
which applied an earlier idea of Heinrich [10] to stochastic differential equations, an 
enormous amount of literature on the application of multilevel Monte Carlo schemes 
to various applications has been published. For an overview of the state of the art 
in the area, the reader is referred to the scientific program and the proceedings of 
MCQMC14 in Leuven. 

This note is intended to show the consequences of the availability of different 
types of convergence results for stochastic partial differential equations of Ito type 
(SPDEs for short in what follows). Here we consider so called strong and weak 
convergence rates, where a sequence of approximations (Yg,£ £ No) of a //-valued 
random variable Y converges strongly (also called in mean square) to Y if 

lim E[\\Y-Y e \\ 2 H ] l ' 2 =0. 
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In the context of this note, H denotes a separable Hilbert space. The sequence is said 
to converge weakly to Y if 


lim |E[<p(T)]—E[<p(7,)]|=0 

for (p in an appropriately chosen class of functionals that depends in general on 
the treated problem. While strong convergence results for approximations of many 
SPDEs are already well-known, corresponding orders of weak convergence that 
are better than the strong ones are just rarely available. For an overview on the 
existing literature on weak convergence, the reader is referred to [1, 11] and the 
literature therein. The necessity to do further research in this area is besides other 
motivations also due to the efficiency of multilevel Monte Carlo approximations, 
which is the content of this note. By a rule of thumb one expects the order of 
weak convergence to be twice the strong one for SPDEs. This is shown under 
certain smoothness assumptions on the SPDE and its approximation in [1]. We 
use the SPDE from [1] and its approximations with the desired strong and weak 
convergence rates to show that the additional knowledge of better weak than strong 
convergence rates changes the choices of the number of samples per level in a 
multilevel Monte Carlo approximation according to the theory. Since, for a given 
accuracy, the number of samples reduces with the availability of weak rates, the 
overall computational work decreases. Computing numbers, we shall see in the end 
that for high dimensional problems and low regularity of the original SPDE the work 
using only strong approximation results is essentially the squared work using also 
weak approximation rates. In other words the order of the complexity of the work in 
terms of accuracy decreases essentially by a factor of 2, when weak convergence rates 
are available. The intention of this note is to point out this important fact by writing 
down the resulting numbers explicitly. First simulation results are presented in the 
end for a stochastic heat equation in one dimension driven by additive space-time 
white noise, which, to the best of my knowledge, are the first simulation results of 
that type in the literature. The obtained results confirm the theory. 

This work is organized as follows: In Section 2 the multilevel Monte Carlo method 
is recalled including results for the approximation of Hilbert-space-valued random 
variables on arbitrary refinements. SPDEs and their approximations are introduced 
in Section 3 and results for strong and weak errors from [1] are summarized. The 
results from Sections 2 and 3 are combined in Section 4 to a multilevel Monte Carlo 
scheme for SPDEs and the consequences of the knowledge of weak convergence 
rates are outlined. Finally, the theory is confirmed by simulations in Section 5. 


2 Multilevel Monte Carlo for Random Variables 


In this section we recall and improve a convergence and a work versus accuracy 
result for the multilevel Monte Carlo estimator of a Hilbert-space-valued random 
variable from [3]. This is used to calculate errors and computational work for the 
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approximation of stochastic partial differential equations in Section 4. A multilevel 
Monte Carlo method for (more general) Banach-space-valued random variables has 
been introduced in [10], where the author derives bounds on the error for given work. 
Here, we do the contrary and bound the overall work for a given accuracy. 

We start with a lemma on the convergence in the number of samples of a Monte 
Carlo estimator. Therefore let (f2, sf, P) be a probability space and let Y be a random 
variable with values in a Hilbert space (B. (•, -)b) and (Y',i G N) be a sequence of 
independent, identically distributed copies of Y. Then the strong law of large numbers 
states that the Monte Carlo estimator En[Y] defined by 

iv i=\ 

converges P-almost surely to E[T] for N —> +°°. In the following lemma we see that 
it also converges in mean square to E[T] if Y is square integrable, i.e., Y £ Lr(Q\B) 
with 


Lr(£2 ,B) := jv : Q — t B, v strongly measurable, < +°° j . 


where 

II v IIz, 2 ( X2 ; b) := -^[IMIb] 1 / 2 - 

In contrast to the almost sure convergence of £)v[y] derived from the strong law of 
large numbers, a convergence rate in mean square can be deduced from the following 
lemma in terms of the number of samples N € N. 

Lemma 1. For any N € N and for Y £ Lr(Q\B), it holds that 

||E[7] —£*[1111^*) = -E VarfT] 1 / 2 < \\Y\\ LHn , B) . 

The lemma is proven in, e.g., [6, Lemma 4.1], It shows that the sequence of so- 
called Monte Carlo estimators (En[Y],N £ N) converges with rate 0(A^ _1//2 ) in 
mean square to the expectation of Y. 

Next let us assume that (Yp_,£ £ No) is a sequence of approximations of Y, e.g., 
Y( £ V(, where (V), £ £ No) is a sequence of finite dimensional subspaces of B. For 
given L £ Nq, it holds that 


L 

Y L = Y 0 +'£(Ye~ Y e-i) 

1 = l 

and due to the linearity of the expectation that 

L 

E[y L ] = E[y 0 ] + £ E ft'-^-i]- 

e= i 
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A possible way to approximate E[K/J is to approximate E[F^ — F^-i] with the cor¬ 
responding Monte Carlo estimator E N( fY/ : — Yf _ | ] with a number of independent 
samples Nf depending on the level L We set 


L 

E l [Y l } := E No [Y 0 ] + £ E Nt [Y ( - Y t -i\ 

1= l 

and call E L \Yf\ the multilevel Monte Carlo estimator of E[Yj,]. The following lemma 
gives convergence results for the estimator depending on the order of weak conver¬ 
gence of (Y(,£g No) to Y and the convergence of the variance of (Y( — Y(_ x ,l £ N). If 
neither estimates on weak convergence rates nor on the convergence of the variances 
are available, one can use — the in general slower — strong convergence rates. 

Lemma 2. Let Y £ L 2 {Q\B) and let {Y(,£ £ No) be a sequence in L 2 (Q\B), then, 
for L £ No, it holds that 


\\m~E L [Y L ]\\ L2{Q , B) 

<l!E[y-y i ]|| B + ||E[y i ]-£ i [T L ]|| i2(I2;B) 

\ 1/2 

VVarp* —r*_i] I 

< ll F - y illL2(t2 ; B)+ E^r^lly + l|y- y f-lllL2(t3 ; B))j 


= ||E[y-y L ]|| B + /V 0 - lv ar[yo] + £ 


t=\ 


1/2 


where T_i := 0. 

Proof. This is essentially [3, Lemma 2.2] except that the square root is kept outside 
the sum. Therefore it remains to show the property of the multilevel Monte Carlo 
estimator that 

||E[y L ] - E l [Y l ] ||2 2 = Nf 1 Var[y 0 ] + £ Nf' Var[y, -Y e _ i], 

i=i 

To prove this we first observe that 


L 

e [y l \ - e l [y l \ = E[y 0 ] - e No [y 0 ] + £ (E[r* -Y t _ x ]- e N( [y ( - y*_i]) 

i=\ 

and that all summands are independent, centered random variables by the construction 
of the multilevel Monte Carlo estimator. Thus [7, Proposition 1.12] implies that 

E[||E[y L ] -E l [Y l ]\\ 2 b ] 

L 

= E[||E[y 0 ] -fw 0 [Fo]||i] + £ E[||E[yg — -E Nt \Yi-Yt-i\t B \ 
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and Lemma 1 yields the claim. □ 

This lemma enables us to choose for a given order of weak convergence of (Yp,£ £ 
No) and for given convergence rates of the variances of (Yp — Yp_ i, £ £ N) the number 
of samples Np on each level £ £ No such that all terms in the error estimate are 
equilibrated. 

The following theorem is essentially Theorem 2.3 in [3]. While it was previously 
formulated for a sequence of discretizations obtained by regular subdivision, i.e., hp = 
C2~ al: , it is written down for general sequences of discretizations here with improved 
sample sizes. For completeness we include the proof. We should also remark that 
the convergence with basis 2 by regular subdivision in [3] is useful and important 
for SPDEs since most available approximation schemes that can be implemented are 
obtained in that way. Nevertheless, it is also known that the refinement with respect 
to basis 2 is not optimal for multilevel Monte Carlo approximations. Therefore it 
makes sense to reformulate the theorem in this more general way. 

Theorem 1. Let ( ap,£ £ No) be a decreasing sequence of positive real numbers 
that converges to zero and let ( Yp,£ £ No) converge weakly to Y, i.e., there exists a 
constant C\ such that 

||E[7 —T?]||b < Ci ap 

for t £ No- Furthermore assume that the variance of (Yp — Yp_ \ , £ £ N) converges 
with order 2 rj £ [ 0 , 2 ] with respect to (ap,£ £ No), i.e., there exists a constant C 2 such 
that 

Var [Y t -Yp-i]<C 2 af, 

and that Var[io] = C 3 . For a chosen level L £ No, set Np := | ~af 2 a 2rl £ 1+e ~\, £ = 
1 ,...,L, e > 0, and Nq := [a^ 2 ], then the error of the multilevel Monte Carlo 
approximation is bounded by 

||E[T]-£ L [T L ]|| i2(i2:B) < (C 1 + (C 3 +C 2 C(1+£)) 1 / 2 )«l, 

where £ denotes the Riemann zeta function, i.e., ||E[F] — E l \Yi\^ L 2 <q. b \ has the same 
order of convergence as ||E[F — K/J ||/j. 

Assume further that the work Wp of one calculation ofYp — Yp_\, £ > 1, is bounded 
by C/p aj K for a constant C 4 and K > 0, that the work to calculate Yq is bounded by 
a constant C 5 , and that the addition of the Monte Carlo estimators costs C^af S for 
some 8 > 0 and some constant CThen the overall work Wl is bounded by 

W L < af 2 (C 5 + C 4 £ aJ {K ~ 211] £ 1+£ ) +C 6 a L s . 

e=i 

If furthermore (ap,£ £ No) decreases polynomially, i.e., there exists a > 1 such that 
ap = 0 (a~ e ), then the bound on the computational work simplifies to 

O (a L max{2 ’ S} ) ifK< 277 , 

0 (ma x{a l } 2+K ~ ri) L 2 +e ,af 5 }) if K > 2 rj. 


L ~ 
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Proof. First, we calculate the error of the multilevel Monte Carlo estimator. It holds 
with the made assumptions that 

N^\/ar[Yo]<C 3 a 2 L 

and, for t = 1 ,..., L, that 

Nf 1 Var[Ye-Y e _ l \<C 2 alaJ 2ll r ( ' l+e) af 1 =C 2 ci 2 L r ( ' l+s ' ) . 

So overall we get that 

EVVarft-!*_!] < C 2 ai £ < C 2 aiC(l+e), 

e=i i= t 

where £ denotes the Riemann zeta function. To finish the calculation of the error we 
apply Lemma 2 and assemble all estimates to 

||E[T] < (Ci + (C 3 +C 2 £(1 +e)) 1/2 )az,. 

Next we calculate the necessary work to achieve this error. The overall work consists 
of the work '(Pf to compute Yf — ¥(_i times the number of samples N t on all levels 
t = 1,..., L, the work "tP^ on level 0, and the addition of the Monte Carlo estimators 
in the end. Therefore, using the observation that N( ~ af 2 a 2rl £ 1+£ , 1= 1,... ,L, and 
Nq ~ a L 2 with equality if the right hand side is an integer, we obtain that 

L 

IPl < C 5 N 0 + C 4 £ N e aJ K + C 6 a L s 

£=1 

L 

< C 5 af 2 + C 4 £ a L 2a ? fi +e aJ K + C 6 a L 5 

i=\ 

< a L 2 (C 5 +C 4 £ aJ {K ~ 2r,) £ 1+e ) +C 6 a L s , 

e=i 

which proves the first claim of the theorem on the necessary work. 

If K" < 2rj and additionally (cif.i £ No) decreases polynomially, the sum on the 
right hand side is absolutely convergent and therefore 

W L <(C 5 +C 4 C)a L 2 +C 6 a L S =0{a L max{2 ' S} ). 

For K > 2r\, it holds that 

7P l < af 2 (C 5 +C 4 af (K ~ 2r,) L 2+e ) +C 6 a L S 
= 0(max{a L ^ 2+K L 2+e ,af S }). 


This finishes the proof of the theorem. 


□ 
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We remark that the computation of the sum over different levels of the Monte 
Carlo estimators does not increase the computational complexity if Yp £ Vp for all 
t € No and (Vp,£ £ No) is a sequence of nested finite dimensional subspaces of B. 


3 Approximation of Stochastic Partial Differential Equations 

In this section we use the framework of [1] and recall the setting and the results 
presented in that manuscript. We use the different orders of strong and weak conver¬ 
gence of a Galerkin method for the approximation of a stochastic parabolic evolution 
problem in Section 4 to show that it is essential for the efficiency of multilevel Monte 
Carlo methods to consider also weak convergence rates and not only strong ones as 
was presented in [ 6 ]. 

Let (H,(-,-) h ) be a separable Hilbert space with induced norm II -Ik and Q:H~^ 
H be a self-adjoint positive semidehnite linear operator. We define the reproducing 
kernel Hilbert space = Q l / 2 (H) with inner product = (£T 1/2 -, Q~ 1/2 -)h, 

where Q 12 denotes the square root of the pseudo inverse of Q which exists due 
to the made assumptions. Let us denote by Lns(Ji?;H) the space of all Hilbert- 
Schmidt operators from Jf? to //, which will be abbreviated by Lhs in what follows. 
Furthermore L(H) is assumed to be the space of all bounded linear operators from 
H to H. Finally, let (Q. .e/, (^t)t>0,P) be a filtered probability space satisfying the 
“usual conditions’’ which extends the probability space already introduced in Section 2. 
The corresponding Bochner spaces are denoted by U’iQ'JI), p > 2, with norms given 
by || • || u>(q-h) = E[|| • ||#] 1//p . In this framework we denote by W = (W(t),t > 0) a 
(=^;);> 0 -ndapted g-Wiener process. Let us consider the stochastic partial differential 
equation 

dX(f) = (AX(f)+F(X(f)))df+dW(f) (1) 

as Hilbert-space-valued stochastic differential equation on the finite time inter¬ 
val (0,7“], T < +°°, with deterministic initial condition X (0) = Xq. We pose the 
following assumptions on the parameters, which ensure the existence of a mild 
solution and some properties of the solution which are necessary for the derivation 
and convergence of approximation schemes. 

Assumption 1. Assume that the parameters of (1) satisfy the following: 

1. Let A be a negative definite, linear operator on H such that (— A )~ 1 £ L(H) and A 
is the generator of an analytic semigroup (S(t),t > 0) on H. 

2. The initial value Xq is deterministic and satisfies (-A)^X o £ H for some j3 £ [0,1], 

3. The covariance operator Q satisfies ||(— A)^~ 1 )/ 2 ||i HS < +°° for the same ft as 
above. 

4. The drift F : H —P H is twice differentiable in the sense that F £ Cj l (H\f-l) (T 
C) 2 (//;// '), where // 1 denotes the dual space of the domain of (—A) 1 / 2 . 

Under Assumption 1, the SPDE (1) has a continuous mild solution 
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X(t)=S(t)X. 0+ f S(t-s)F(X(s))ds+ [‘S(t-s)dW(s) (2) 

Jo Jo 

for t £ [0, T], which is in L P {Q\H) for all p > 2 and satisfies for some constant C 
that 

sup ||2f(f)|| iP ( i 2 ;ff ) < C(1 + IIXqIIh). 
ie[o,r] 

We approximate the mild solution by a Galerkin method in space and a semi-implicit 
Euler-Maruyama scheme in time, which is made precise in what follows and spares 
us the treatment of stability issues. Therefore let (VpJ. £ No) be a nested family of 
finite dimensional subspaces of V with refinement level £ £ No, refinement sizes 
(hp,£ £ No), associated //-orthogonal projections Pp, and norm induced by H. For 
£ £ No, the sequence (Vp,£ £ No) is supposed to be dense in H in the sense that for 
all (f> £ H, it holds that 

,lim U-Pp<S>\\h=Q- 

We denote the approximate operator by ,4/ : Vp —? Vp and specify the necessary 
properties in Assumption 2 below. Furthermore let (@",n £ No) be a sequence of 
equidistant time discretizations with step sizes At", i.e., for n £ No, 

0" := {t n k = Afk , k = 0, ... ,N{n)}, 

where N(n) = T /At n , which we assume to be an integer for simplicity reasons. We 
define the fully discrete semigroup approximation by Sp_ n := (I At"Apy y Pp and 
assume the following: 

Assumption 2. The linear operators Ap :Vp —>Vp, £ £ No, and the orthogonal projec¬ 
tors Pp : H —> Vp, £ £ No, satisfy for all k = 1,..., N(n) that 

ll(-^) p stM< c feT p 


for p > 0 and 

\\(-A t )-Pp t (-A)P\\ m <C 

for p £ [0,1 /2] uniformly in £,n £ No- Furthermore they satisfy for all d £ [0,2], 
p e [—0,min{l,2— 0}], and k = 1,... ,N(n), 

||(5(4')-5t j;i )(-A)^ /2 || M// ) < + (^r") e/2 )(4')- (e ^ } / 2 . 

The fully discrete semi-implicit Euler-Maruyama approximation is then given in 
recursive form for t k = At"k £ 0" and for £ £ No by 

with Xpp n (0) := PpXf), which may be rewritten as 

XtA*k) = S i„ X 0 +At" t S \~n i+lp M n j-1 )) + t f! S \T l d ^)- (3) 

j= 1 j=l J, ]-l 
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We remark here that we do not approximate the noise which might cause problems in 
implementations. One way to treat this problem is to truncate the Karhunen-Loeve 
expansion of the Q- Wiener process depending on the decay of the spectrum of Q 
(see [5. 2]). 

The theory on strong convergence of the introduced approximation scheme is 
already developed for some time and the convergence rates are well-known and 
stated in the following theorem. 

Theorem 2 (Strong convergence [1]). Let the stochastic evolution equation (1) with 
mild solution X and the sequence of its approximations (X^ n ,£,n £ No) given by (3) 
satisfy Assumptions 1 and 2 for some f3 £ (0,1]. Then, for every y£ (0,j3), there 
exists a constant C > 0 such that for all t, n £ No, 

max m^)-X^)ll L2(a ^<C(hJ+(An r/2 ). 
k=\ . N(n) 

It should be remarked at this point that the order of strong convergence does not 
exceed 1 /2 although we are considering additive noise since the regularity of the 
parameters of the SPDE are assumed to be rough. Under smoothness assumptions 
the rate of strong convergence attains one for additive noise since the higher order 
Milstein scheme is equal to the Euler-Maruyama scheme. Nevertheless, under the 
made assumptions on the regularity of the initial condition Xq and the covariance 
operator Q of the noise, this does not happen in the considered case. 

The purpose of the multilevel Monte Carlo method is to approximate expressions 
of the form E[<p(X(f))] efficiently, where tp : H —> M is a sufficiently smooth func¬ 
tional. Therefore weak error estimates of the form |E[(p(A _ (f^'))] — E[(p(X^ n (f^))]| 
are of importance. Before we state the convergence theorem from [1], we specify the 
necessary properties of (p in the following assumption. 

Assumption 3. The functional <p : H —>M. is twice continuously Frechet differen¬ 
tiable and there exists an integer m > 2 and a constant C such that for all x £ H and 
7 = 1,2, 

l|9 a) W|| L W (ff ;M) < C (1 + IWliT 7 ), 

where |!<P^( x )IIl[,"](//r) I s th e sma H est constant K > 0 such that for all ui,...,u m £ 
H, 

\<P {j) (x)(ui ,.. .,u m ) I < K\\ui ||tf • • • . 

Combining this assumption on the functional (p with Assumptions 1 and 2 on the 
parameters and approximation of the SPDE, we obtain the following result, which 
was proven in [1] using Malliavin calculus. 

Theorem 3 (Weak convergence [1]). Let the stochastic evolution equation (1) with 
mild solution X and the sequence of its approximations (X( n ,l, n £ No) given by (3) 
satisfy Assumptions 1 and 2 for some j3 £ (0,1]. Then, for every (p : H —>• K satisfying 
Assumption 3 and all y £ [0,/l), there exists a constant C > 0 such that for all 
£,n £ N 0 , 
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max |E[<p(X(4 1 ))-<p(^„(4 i ))]l<C(/^+(4f")^). 

k=l,...,N(n) 

An example that satisfies Assumptions 1 and 2 is presented in Section 5 of [1] and 
consists of a (general) heat equation on a bounded, convex, and polygonal domain 
which is approximated with a finite element method using continuous piecewise 
linear functions. 


4 SPDE Multilevel Monte Carlo Approximation 

In the previous section, we considered weak error analysis for expressions of the 
form E[<p(X(f))], where we approximated the mild solution X of the SPDE (1) with 
a fully discrete scheme. Unluckily, this is not yet sufficient to compute “numbers” 
since we are in general not able to compute the expectation exactly. Going back to 
Section 2, we recall that the first approach to approximate the expected value is to do 
a (singlelevel) Monte Carlo approximation. This leads to the overall error given in 
the following corollary, which is proven similarly to [3, Corollary 3.6] and included 
for completeness. 

Corollary 1. Let the stochastic evolution equation (1) with mild solution X and the 
sequence of its approximations (X( n ,Ln £ No) given by (3) satisfy Assumptions 1 
and 2 for some j3 € (0,1]. Then, for every (p : H —> R. satisfying Assumption 3 and 
all 7 G [0, j3), there exists a constant C > 0 such that for all t , n G No, the error of 
the Monte Carlo approximation is bounded by 

A— i ma tv(n) [tP {t'l ) ) )] 11L 2 (A3 ;M) < + (At") 7 + 

for JVeN. 

Proof By the triangle inequality we obtain that 

mcpixitm-ENimAtmkHnm 

< ||E[<p(X(fJt))] — E[cp(Xie > /»(i*)))]Ilz. 2 (i2 jr) 

+ 11 ® [<P (Xe ,n {tjk ) ) ) ] — E N [<p (X(,n (tk ) ) )] IL 2 (A2 ;Mp 

The first term is bounded by the weak error in Theorem 3 while the second one is 
the Monte Carlo error in Lemma 1. Putting these two estimates together yields the 
claim. □ 

The errors are all converging with the same speed if we couple £ and n such that 
hj ~ At" as well as the number of Monte Carlo samples Ne for t £ No by N( ~ h f 4r . 
This implies for the overall work that 


We, = We H ■ Wl ■ W^ = O (hJ d {At n )~ l N t ) = 0(hJ {d+2+4y) 
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where we assumed that the computational work in space is bounded by = 0 (hj d ) 
for some d > 0, which refers usually to the dimension of the underlying spatial 
domain. 

Since we have just seen that a (singlelevel) Monte Carlo simulation is rather 
expensive, the idea is to use a multilevel Monte Carlo approach instead which is 
obtained by the combination of the results of the previous two sections. In what 
follows we show that it is essential for the computational costs that weak convergence 
results are available, since the number of samples that should be chosen according to 
the theory depends heavily on this fact, if weak and strong convergence rates do not 
coincide. 

Let us start under the assumption that Theorem 3 (weak convergence rates) is not 
available. This leads to the following numbers of samples and computational work. 

Corollary 2 (Strong convergence). Let the stochastic evolution equation (1) with 
mild solution X and the sequence of its approximations (X( n ,l, n £ No) given by (3) 
satisfy Assumptions 1 and 2 for some [5 £ (0,1]. Furthermore couple £ and n such 
that At” — h 2 and for L £ No, set No — \h L 2 ^] as well as Np ~ \ h L ~^ If? C l+e ~\ for 
all l = 1,..., L and arbitrary fixed £ > 0. Then, for every (p : H —> R. satisfying 
Assumption 3 and all y £ [0,(3), there exists a constant C > 0 such that for all 
£,n £ No, the error of the multilevel Monte Carlo approximation is bounded by 

max ||E[ 9 (X(^))]-£ i [9(X tiBt (^))]|| t2(fl;a) <C/.2, 
k=\,...,N{n L ) 

where np, is chosen according to the coupling with L. If the work of one computation 
in space is bounded by TFp 1 = 0(hJ d )for i = 0,... ,L and fixed d > 0, which includes 
the summation of different levels, the overall work will be bounded by 

W L = 0{hf {d+2) L 2+e ). 


Proof We first observe that 

max )\\ L 2 (n , H) <C(h Y L + (At n y? 2 )~C-2.hl 

k=l,...,N(n L ) 

by Theorem 2 and the coupling of the space and time discretizations. Furthermore it 
holds that 

max |E[9(X(^))]-E[9(A^ l (^))]| 
k=l,...,N(n L ) 

-z , ma vz JI < P( X ^ i ))“ < ? ) ( X L«L(^ i ))llL 2 (r3;R) 

k=l,...,N(n L ) v ' 

~ C , , ma «z ,\\ X ^l L )~ X Ln L {tk)\\Lfn-H) 

k=l,...,N(n L ) 

< Ch y L . 


since <p is assumed to be a Lipschitz functional (cf. [5, Proposition 3.4]). Furthermore 
Lemma 2 implies that 
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Var[<p(X^(f)) - (p(Xt- hn{ _ j (/))] 

<2(||9(X(0)-9(^(0)lltf (fl; R ) + ll9(^(0)-9(^-i^_ 1 (0)lltf ( flji ) ) 

< C/if. 

Setting a/ = /if, rj = 1, and the sample numbers according to Theorem 1, we obtain 
the claim. □ 

If the additional information of better weak convergence rates from Theorem 3 
is available, the parameters that are plugged into Theorem 1 change, which leads 
for given accuracy to less samples and therefore to less computational work. This 
is made precise in the following corollary and the computations for given accuracy 
afterwards. 

Corollary 3 (Weak convergence). Let the stochastic evolution equation (1) with 
mild solution X and the sequence of its approximations (Xp n ,£,n £ No) given by (3) 
satisfy Assumptions 1 and 2 for some (5 £ (0,1]. Furthermore couple £ and n such 
that At" ~ /if and for L £ No, set No — |7i L 4 f as well as Np ~ [Ti^f^/if. f 1+e ] for 
all £ = 1,... ,L and arbitrary fixed £ > 0. Then, for every (p : H —» K. satisfying 
Assumption 3 and all y £ [0,(3), there exists a constant C > 0 such that for all 
t, n £ No, the error of the multilevel Monte Carlo approximation is bounded by 

max ||E[<p(X(t” L ))] -E L [(p(Xp nL {t n k L ))]|| i 2 (i2;K) < C/if, 

k=l,...,N(n L ) 

where ni is chosen according to the coupling with L. If the work of one computation 
in space is bounded by = 0 (h, <1 ) for £ = £),...,Land fixed d > 0, which includes 
the summation of different levels, the overall work will be bounded by 

W L = 0{hf {d+2+2r) L 2+e ). 

Proof The proof is the same as for Corollary 2 except that we obtain 
max |E \(p{X(t n k L ))} - E[^(X L ,„ L (f” L ))]| < C/if 

k= 1. N(n L ) 

directly from Theorem 3 and therefore set ap = /if, 77 = 1/2, and the sample numbers 
according to these choices in Theorem 1. □ 

If we take regular subdivisions of the grids, i.e., we set, up to a constant, hp := 2“^ 
for £ £ No and rescale both corollaries such that the convergence rates are the same, 
i.e., the errors are bounded by O(/if), we obtain that for a given accuracy £/ on 
level L £ N, Corollary 2 leads to computational work 

*L = O (2 2+ ^£f rf+2)/ >g 2 £d) 


while the estimators in Corollary 3 can be computed in 
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Wl = Q (^T e i ((<,+2)/(2r,+1)|log 2 ei| ) ' 

Therefore the availability of weak convergence rates implies a reduction of the 
computational complexity of the multilevel Monte Carlo estimator which depends 
on the regularity y and d referring to the dimension of the problem in space. For 
large d , the work using strong convergence rates is essentially the squared work that 
is needed with the knowledge of weak rates. Additionally, for all d > 0, the rates 
are better and especially in dimension d = 1 we obtain e L for the weak rates 

versus e^, where y £ (0,1). Nevertheless, one should also mention that Corollary 2 
already reduces the work for 4y > d + 2 compared to a (singlelevel) Monte Carlo 
approximation according to weak convergence rates. The results are put together in 
Table 1 for a quick overview. 



Monte Carlo 

MLMC with strong conv. 

MLMC with weak conv. 

general 

-((</+2)/(2y)+2) 

2 2 + e2^ E H d + 2 )/r\ log2EL \ 

^ e -u^/™ + u |log2£i| 

y=l, omitting const. 

P ~(d/2+ 3) 

£i lrf+2) |log 2 £ t | 

e L wi+2) |log 2 e L | 


Table 1 Computational work of different Monte Carlo type approximations for a given precision 
£l- 


5 Simulation 

In this section simulation results of the theory of Section 4 are shown, where it has 
to be admitted that the chosen example fits better the framework of [6] since we 
estimate the expectation of the solution instead of the expectation of a functional of 
the solution. Simulations that fit the conditions of Section 4 are under investigation. 
Here we simulate similarly to [4] and [5] the heat equation driven by additive Wiener 
noise 

dX(t)=AX(t)dt + dW(t) 

on the space interval (0,1) and the time interval [0,1] with initial condition X (0,x) = 
sin(7Lv) for x £ (0,1). In contrast to previous simulations, the noise is assumed to be 
white in space to reduce the strong convergence rate of the scheme to (essentially) 1 /2. 
The solution to the corresponding deterministic system with u(t) = E[A(f ) for 
t G [0,1] 

dn(f) = Au(t)dt 

is in this case u(t,x) = exp(— n 2 t) sin(7Dc) for x £ (0,1) and t £ [0,1]. 

The space discretization is done with a finite element method and the hat function 
basis, i.e., with the spaces > 0) of piecewise linear, continuous polynomials 

(see, e.g., [6, Example 3.1]). The numbers of multilevel Monte Carlo samples are 
calculated according to Corollary 2 and Corollary 3 with e = 1 to compare the 
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convergence and complexity properties with and without the availability of weak 
convergence rates. In the left graph in Figure 1, the multilevel Monte Carlo estimator 




Grid points on finest level Grid points on finest level 


Fig. 1 Mean square error of the multilevel Monte Carlo estimator with samples chosen according 
to Corollary 2 and Corollary 3. 


E L [Xl, 2 l( 1)] was calculated for L = 1,... ,5 for available weak convergence rates as 
in Corollary 3 while just for L = 1,..., 4 in the other case to finish the simulations in 
a reasonable time on an ordinary laptop. The plot shows the approximation of 

\\E[X(l)}-E L [X L , L mH=(l (exp(—7T 2 ) sin(7Tx) — E L [X L 2L (1 ,x)}) 2 dxj / , 
i.e., 

ei(X L n l) ■= (- X!( ex P(~ 7r2 ) sin ( 7rA 4)-£ ,L [^L.2L(l,^)]) 2 J . 

Km k= 1 7 

Here, for all levels L = 1,..., 5, m = 2 5 + 1 and xj c ,k= 1,..., m, are the nodal points 
of the finest discretization, i.e., on level 5 respectively 4. The multilevel Monte 
Carlo estimator E l \Xl, 2 l] is calculated at these points by its basis representation 
for L = 1,...,4, which is equal to the linear interpolation to all grid points x^, 
k= 1,...One observes the convergence of one multilevel Monte Carlo estimator, 
i.e., the almost sure convergence of the method, which can be shown using the mean 
square convergence and the Borel-Cantelli lemma. In the graph on the right hand 
side of Figure 1, the error is estimated by 


,| IV \ 1/2 

cn(X l , 2 l) ■= Xj ei ( x l,2l)j ) 

where (X[ 2L , i = 1,... ,N) is a sequence of independent, identically distributed sam¬ 
ples of Xl ^l and N = 10. The simulation results confirm the theory. In Figure 2 the 
computational costs per level of the simulations on a laptop using matlab are shown 
for both frameworks. It is obvious that the computations using weak convergence 
rates are substantially faster. One observes especially that the computations with 
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Fig. 2 Computational work of the multilevel Monte Carlo estimator with samples chosen according 
to Corollary 2 and Corollary 3. 


weak rates on level 5 take less time than the ones with strong rates on level 4. The 
computing times match the bounds of the computational work that were obtained in 
Corollary 3 and Corollary 2. 

Finally, Figure 1 and Figure 2 include besides e = 1 also simulation results for 
the border case e = 0 in the choices of sample sizes per level. One observes in the 
left graph in Figure 1 that the variance of the errors for e = 0 in combination with 
Corollary 2 is high, which is visible in the nonalignment of the single simulation 
results. Furthermore the combination of Figure 1 and Figure 2 shows that e = 0 
combined with Corollary 3 and e = 1 with Corollary 2 lead to similar errors, but that 
the first choice of sample sizes is essentially less expensive in terms of computational 
complexity. Therefore the border case £ = 0, which is not included in the theory, 
might be worth to consider in practice. 
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